ABSTRACT 

The goal of this study was to investigate whether 
expert physicians' knowledge can be represented in the form of 
illness scripts. The idea of "scripts" was introduced by Schank and 
Abelson (1977) to explain why people are able to bring to bear 
enormous amounts of knowledge almost effortlessly in practical 
real-life situations. Previous script-related research has revealed 
that recognition memory discrimination for typical script items is 
generally poor. An experiment was designed to investigate whether 
this result would also apply to illness scripts, and whether level of 
expertise would influence recognition memory for illness script 
items. Though a significant interaction between typicality and 
textual presence of items was found for experienced physicians (n=23) 
but not for fourth-year medical students (n=22), no clear 
developmental trend could be discerned; the intermediate group of 
sixth-year medical students (n=20) appeared to have a more accurate 
recognition memory than either the experts or the novices. The 
results are discussed with regard to the development of illness 
scripts. One table and two figures present study findings. An 
appendix contains a case example. (Contains 20 references.) (SLD) 

Abstract 



The goal of the present study was to investigate whether expert 
physicians' knowledge can be represented as illness scripts. Previous script- 
related research has revealed that recognition memory discrimination for 
script typical items is generally poor. An experiment was designed to 
investigate whether this result would also apply to illness scripts, and whether 
level of expertise would influence recognition memory for illness script items. 
Though a significant interaction between typicality and textual presence of 
items was found for experienced physicians but not for 4th-year students, no 
clear developmental trend could be discerned: the intermediate group (i.e., 
6th-year students) appeared to have a more accurate recognition memory than 
either the experts or the novices. The results are discussed with regard to 
the development of illness scripts. 

The representation of knowledge in general, and medical expert 
knowledge in particular, is a rather controversial issue. Part of this 
controversy arises from the fact that for different purposes it is useful to 
represent medical knowledge at different levels of description. Though few 
would disagree that neurons are somehow involved in human medical 
cognition, it is also obvious that the possibilities to describe and explain 
expert behavior at a neural level are still rather remote at the moment. 
Descriptions of expertise at a much higher level, for example in terms of 
experience, insight, or pattern recognition, may cover any expert behavior, 
but at the expense of offering no guidelines at all about how to improve 
expertise or to optimize its development by designing instruction or 
education. Therefore, most researchers have tried to apply psychological 
theories of an intermediate level to human expert behavior. In the 1960s and 
70s, these theories heavily emphasized the distinction between the 
knowledge base and problem solving methods (e.g., Newell & Simon, 1972). 
As it became clear by the end of the 1970s that expert physicians could not be 
distinguished from nonexpert physicians on formal aspects of the 
problem-solving process (e.g., Elstein, Shulman, & Sprafka, 1978), and that the 
essence of expertise is inherent in the structure of expert knowledge, 
theoretical interest shifted toward the way knowledge can be represented. 
Psychological theories that address this issue of knowledge representation, 
and can therefore be applied to expert medical cognition, include, among others, 
ACT* (Anderson, 1983), SOAR (Newell, 1990), script theory (Schank & 
Abelson, 1977), and mental models (Johnson-Laird, 1983). 

The present research is based on the assumption that expert medical 
knowledge, at least the clinical part of it, can be represented as a large set of 
illness scripts. The idea of "scripts" was introduced by Schank and Abelson 
(1977) to explain why people are able to bring to bear enormous amounts of 
knowledge almost effortlessly in practical real-life situations. A script is a 
cognitive structure that refers to a body of knowledge associated with a sequence 
of events that occurs frequently in a specific order (Fayol & Monteil, 1988; 
Schank & Abelson, 1977). As scripts guide inferencing during 
comprehension, their structure can be conceived as a set of concepts 
interrelated by firmly established excitatory links, with inhibiting connections 
to concepts that do not fit script-based expectancies. Thus, as a consequence 
of script activation, a whole set of concepts becomes automatically activated, 
even if no specific information about the individual members of this set is 
available yet. 

In 1984, Feltovich and Barrows applied the notion of script to the medical 
domain. However, from their point of view, illness scripts are structures 
more like mental models than like the original Schank and Abelson scripts. 
For example, they define illness scripts as representations that need to be 
constructed for each patient on the basis of biomedical knowledge. But 
subsequent research has revealed that expert physicians do not seem to rely 
that much on biomedical knowledge (e.g., Boshuizen & Schmidt, 1992). 
However, an important asset of Feltovich and Barrows' illness script was 
the distinction between Enabling Conditions (contextual and patient-related 
factors that influence the probability that someone gets a disease) and 
Consequences (complaints, signs, and symptoms of a disease). Experts' illness 
scripts are ready-made packages of knowledge about Enabling Conditions and 
Consequences of diseases; these packages can be activated quickly in practical 
situations: thus, they can probably be described as script-like structures. 

Evidently, it is not easy to prove that knowledge is represented in the 
form of scripts. However, script theory and script-related research have 
generated some predictions that have received empirical support. For 
example, if a script is activated, e.g., by reading a text, the typical concepts 
associated with that particular script are also activated automatically. This 
activation is independent of the actual appearance of those concepts in the 
text. Consequently, it has been found that items or concepts that are very 
central or typical with respect to an activated script, but are never explicitly 
mentioned, have a rather high chance of being falsely recognized in a 
recognition test (e.g., Maki, 1990; Smith & Graesser, 1981; Walker & 
Yekovich, 1984). In this case, it is difficult for subjects to determine whether 
the activation of a concept is due to its actual presentation in a text or 
whether it is only implicitly inferred, i.e., activated by links with other, 
actually stated concepts. Atypical items, on the other hand, can only be 
activated by reading them in the text and tagging them to the activated script, 
as this atypical information does not fit in a specific script slot (Graesser, 
Woll, Kowalski, & Smith, 1980; Graesser & Nakamura, 1982). Thus, 
recognition decisions for atypical items can be made quickly: all the subject 
has to do is to check whether there is a tag in memory for that item. Typical 
or central concepts, on the other hand, usually receive no specific tag in 
memory, even if they are explicitly mentioned. Therefore, memory 
discrimination is often reported as better for atypical, explicitly stated items 
than for typical, explicitly stated items (cf. Bellezza & Bower, 1981; Smith & 
Graesser, 1981; Yekovich & Walker, 1986). 

The present study was designed to investigate whether these 
characteristics of "real-life scripts" also apply to illness scripts. For example, 
would subjects be inclined to erroneously recognize patient characteristics or 
symptoms that might be expected given a particular disease, but are in fact 
not mentioned, or mentioned in other wordings, in a case? Arkes and 
Harkness (1980) found evidence for false recognition of disease-consistent but 
unstated symptoms, provided that the diagnosis was known to the subjects. 
Another question is whether recognition measures might reveal differences 
between the scripts of experienced physicians and those of medical students. 
As yet, research on this topic suggests that advanced students' illness scripts 
are more diffuse, with links between concepts less well established, and with 
the appropriate values to fill in slots less well circumscribed, than physicians' 
illness scripts (Custers, Boshuizen, & Schmidt, 1992; Custers et al., in 
preparation). Thus, it might be expected that the predicted effects of 
typicality and textual presence of information on recognition measures are 
less conspicuous for nonexpert subjects. Consequently, in an actual 
recognition experiment, students would show relatively good recognition 
memory for unstated, but script-prototypical statements, while experienced 
physicians would make relatively many errors on this type of item. 

An experiment was designed to test these hypotheses. Case descriptions 
were presented to subjects of different expertise levels, followed by a set of 
recognition statements. These recognition statements had either been 
literally presented in the case or not, and they could be prototypical or 
atypical for the particular disease. Thus, typicality and textual presence were 
independently manipulated. The influence of this experimental manipulation 
on recognition measures was investigated. It was predicted that an 
interaction would be found between typicality and textual presence, with 
unstated prototypical statements showing comparatively high false alarm rates. 
In addition, a three-way interaction between expertise level, typicality and 
textual presence would be expected, with experienced subjects generally 
showing particularly poor memory discrimination for unstated, prototypical 
items. 



METHOD 

Subjects 

Subjects were 22 fourth-year students, 20 sixth-year students, and 23 
experienced family physicians. The fourth-year students had followed a 
four-year curriculum of theoretical and practical medical education, but they had 
virtually no clinical experience. They were tested at the end of the term. The 
sixth-year students had completed at least 75% of their clerkships, and 
therefore had walked the wards for 16 months or more. All student subjects 
were from the University of Limburg at Maastricht, The Netherlands. 

The experienced physicians were recruited from general practitioners in 
the Maastricht area. They had on average 16.25 years of experience as 
practicing family physicians, ranging from 5.75 years to 41 years. 

Material 

From a set of 24 diseases used in previous research (Custers, Boshuizen, & 
Schmidt, 1993), nine were selected to be included in the present study. 
These diseases were: aneurysm of the aortic artery, herpes zoster, nervous 
abdominal pain, dermatitis peri-oralis, pre-infarct syndrome, vaginal 
candidiosis, epidural hematoma, kidney stones colic, and carcinoma of the 
head of the pancreas. Based on these afflictions, computerized case 
descriptions were constructed. Each case consisted of a number of 
statements, ranging from 15 to 24, which provided information about the 
patient's context and background, the setting (e.g., consultation hour, 
emergency telephone call, house call), the main complaints, and some 
symptoms. Though each case described a quite "textbook-like" patient, it also 
included some atypical patient characteristics and symptoms, while on the 
other hand some highly typical aspects of the disease were deliberately 
omitted. Appendix A shows an example of a case description. 

For each case, a set of ten test statements was constructed. Five of these 
statements were exact copies of statements that appeared in the case 
description. The remaining five statements differed, at least as far as the 
wording concerned, substantially from any statement that appeared in the 
case. These types of statements will be called "stated" and "unstated", 
respectively. Both categories of statements were further divided into three 
prototypical items and two atypical items. The prototypical items contained 
information that was completely typical for the disease the case description 
was based upon, information that could be either stated or unstated. The 
atypical items, on the other hand, contained information that was not very 
typical for the disease in question. Thus, for example, atypical-unstated items 
were statements about patient contextual factors or disease characteristics 
that were neither very typical for the disease in question, nor literally 
mentioned in the case presentation. Figure 1 shows the organization of the 
test statements. Thus, in all there were three prototypical-literally stated 
items, three prototypical-unstated items, two atypical-literally stated items, 
and two atypical-unstated items; and these ten items were divided over four 
Enabling Conditions and six Consequences. Appendix A shows the test 
statements that were presented for the kidney stones colic case. 

Figure 1. Overview of the ten test statements for each case 

              prototypical              atypical
stated        1 Enabling Condition      1 Enabling Condition
              2 Consequences            1 Consequence
unstated      1 Enabling Condition      1 Enabling Condition
              2 Consequences            1 Consequence



Procedure 

Subjects were tested individually, the students at the university 
department, the family physicians in their own office. Each experimental 
session consisted of a study task, an interim task and a test task. Both the 
study task and the test task were presented on a Macintosh Plus computer 
screen, and written in and controlled by Authorware. The interim task was 
not related to either of the other tasks; its role was simply to clear the subject's 
short-term memory. 

After a short general introduction, the study task was started. The nine 
case descriptions were presented successively. Before each case was started, 
the name of the disease associated with that case was displayed on the screen. 
Subjects were encouraged to ask questions if they did not know the 
announced disease, or were in doubt about any aspect of it. If everything was 
clear, they could start the case presentation by pressing a button on the 
keyboard. Upon starting the case, the statements successively appeared on 
the screen. Each statement remained visible for a fixed time. This display 
duration was determined by the formula: 

t = [1500 + 35 × number of text characters] msec 

In previous research, it was found that subjects process this type of 
statement at a rate of approximately 35 milliseconds per text character. For 
purposes of the present study, we took a base rate of 1500 milliseconds per 
statement, supplemented with 35 milliseconds for each text character. The 
base rate of 1500 milliseconds was determined empirically in a pilot study, 
and it resulted in a display time for each statement that was neither too short 
for the inexperienced subjects, nor too long for the expert physicians to 
process the presented information. Consequently, it might be assumed that 
every subject had sufficient time to read and comprehend each statement, but 
not enough time to memorize it. 
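
The display-time rule described above can be sketched in a few lines of Python. This is only an illustration: the constants come from the text, but the function name is hypothetical, and the assumption that every character (including spaces and punctuation) counts is ours, since the text does not say how characters were counted. The example sentence is the English rendering from Appendix A, so its length need not match that of the statement actually shown to subjects.

```python
BASE_MS = 1500      # base display time per statement (from the text)
PER_CHAR_MS = 35    # additional time per text character (from the text)


def display_duration_ms(statement: str) -> int:
    """Return the display time in milliseconds for one case statement.

    Assumption: every character, including spaces and punctuation,
    counts toward the length; the original procedure may have differed.
    """
    return BASE_MS + PER_CHAR_MS * len(statement)


# A 30-character statement is shown for 1500 + 35 * 30 = 2550 msec.
print(display_duration_ms("The pain extends to his groins"))
```

Under this rule a short statement such as "Man, aged 47" stays on screen for under two seconds, while the longest statements remain visible for several seconds.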

Subjects were instructed to read each case as attentively as possible and 
to try to assimilate as much of the presented information as they could. All 
nine cases were presented successively; after completing each case 
presentation, an opportunity for remarks, questions or a short pause was 
provided. Subjects were not allowed to write anything down. Although the 
experimenter announced in the introduction that afterwards a task, based on 
the present cases, would be presented, the true nature of this task was of 
course not revealed in advance to the subjects. 

After finishing the study phase a short interim task was presented, in 
which subjects were asked to tell something about medical journals they were 
familiar with (e.g., their content, the quality of the articles, their practical 
utility, the subject's preferences). The duration of this task was about 2-3 
minutes. 

Next, the test task was administered. For each case, subjects were shown 
the ten test statements, one by one. The set of test statements associated with 
each individual case was always presented as a block, with the individual 
statements appearing in an order that was randomly determined, but fixed in 
advance. Subjects were instructed to decide as accurately and as quickly as 
possible for each individual statement whether it had been literally presented 
in the original case presentation or not. In order to answer this question, a 
press-button device, until that moment carefully hidden from subjects' view, 
was connected to the computer. This device contained two buttons, the left 
one labeled "yes", the right one labeled "no". Subjects were instructed to 
press the "yes" button if they judged a particular test item as having been 
literally presented in the case, and "no" if they judged that this had not been 
so. 

It was emphasized that the test statements had either been literally 
presented in the associated case, or were considerably different from any 
statement in any case, at least in wording. However, it was also stressed that a 
particular test statement could be very similar in meaning to an item in the 
case, but that this meaning should not be taken into consideration, as their 
task was only to judge the literal presence of the statements. 

Before each block of statements associated with a case was presented, the 
name of the disease the case was based on was again announced on the 
screen, in order to reinstate the proper illness script context. The sequence 
of the cases in the test session was the same as in the study session. Between 
blocks of case test items, an opportunity for questions or a short 
pause was again provided. After the test phase was finished, subjects were 
debriefed and received a small reward for their participation. 



Analysis 

Every time a subject pressed a button, the selected response ("yes" or 
"no") was registered. The first case was a practice case and its results were 
not included in the analysis. For the remaining eight cases, the average number of 
"yes" answers was computed for each of the ten different types of test statements 
(see Appendix A). Thus, this procedure yielded ten "yes/no" measures for 
every subject, based on the eight instances of each test statement type. In 
fact, this measure can also be conceived as ten scores on a nine-point 
recognition scale, each score ranging from zero (a "no" answer for each of the 
eight instances of a given statement type) to eight (eight times a "yes" answer 
for each of the eight instances of a given statement type). However, in order 
to account for the fact that the test task for each case included three 
prototypical statements of both types and only two atypical statements, the 
percentage of "yes" answers for each statement type was used, rather than the 
raw number. Subsequently, these percentage values were analyzed with a 
three (levels of expertise) by two (prototypical versus atypical) by two (textual 
presence, or: stated versus unstated) analysis of variance, with expertise level 
as the between-subjects factor and typicality and textual presence as 
within-subjects factors. 
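
The reason for using percentages rather than raw counts can be made concrete with a small sketch. Over the eight scored cases, each subject judges 24 prototypical but only 16 atypical statements at each level of textual presence (8 × 3 versus 8 × 2), so raw "yes" counts are not comparable across cells. The function below is hypothetical and is not the authors' scoring program, and the responses in the example are invented for illustration.

```python
from collections import defaultdict


def percent_yes(responses):
    """Percentage of "yes" answers per (typicality, presence) cell.

    responses: iterable of (typicality, presence, said_yes) tuples for
    one subject, e.g. ("P", "S", True) for a prototypical stated item
    that was answered "yes".
    """
    yes = defaultdict(int)
    total = defaultdict(int)
    for typicality, presence, said_yes in responses:
        cell = (typicality, presence)
        total[cell] += 1
        yes[cell] += int(said_yes)
    # Dividing by the cell size puts all four cells on the same 0-100 scale.
    return {cell: 100.0 * yes[cell] / total[cell] for cell in total}


# Invented responses of one subject to the ten test statements of one case.
one_case = [
    ("P", "S", True), ("P", "S", True), ("P", "S", False),
    ("P", "U", True), ("P", "U", False), ("P", "U", False),
    ("A", "S", True), ("A", "S", True),
    ("A", "U", False), ("A", "U", False),
]
for cell, pct in sorted(percent_yes(one_case).items()):
    print(cell, round(pct, 2))
```

Pooling the tuples from all eight scored cases before calling the function gives the four percentage values per subject that enter the analysis of variance.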



RESULTS AND DISCUSSION 

The results showed a significant main effect of expertise level [F(2, 62) = 
5.348, p < .001, MSe = 297.510], a significant main effect of typicality [F(1, 
62) = 35.138, p < .0001, MSe = 66.738], and a significant main effect of 
textual presence [F(1, 62) = 772.052, p < .0001, MSe = 269.940]. 
Furthermore, significant two-way interactions between textual presence and 
expertise [F(2, 62) = 6.053, p < .005, MSe = 269.940] and between typicality 
and textual presence [F(1, 62) = 17.183, p < .0001, MSe = 54.988] were 
found, but no significant interaction between typicality and expertise. Finally, 
the results showed a significant three-way interaction between expertise, 
typicality and textual presence [F(2, 62) = 5.496, p < .01, MSe = 54.988]. The 
results are shown in Table 1. 



From Table 1, it can be read that the main effect of expertise level can be 
accounted for by the 4th-year students generally positively "recognizing" 
relatively more statements than the 6th-year students or the family physicians. 
The main effects of typicality and textual presence are relatively 
straightforward: over all expertise levels, prototypical statements more often 
receive a "yes" answer than atypical statements (55.58% vs. 49.52%), and 
stated items are of course more often recognized than unstated items 
(80.71% vs. 24.39%). Figure 2 shows the interaction effects. The most interesting 
two-way interaction in light of the illness script theory is the one between 
typicality and textual presence. Generally, i.e., over the three expertise levels, 
explicitly stated items are recognized about equally well, regardless of whether 
they are prototypical or atypical, while unstated items are more often falsely 
recognized if they are prototypical than if they are atypical. However, the 
finding of a significant three-way interaction between expertise level, 
typicality and textual presence indicates that the interaction between 
typicality and textual presence is not alike for all expertise levels. The illness 
script theory predicts that this latter interaction should be accounted for 
mainly by the results of the family physicians; Figure 2 shows that this 
prediction is borne out: for 4th-year students, the interaction is absent, while 
for family physicians, it is evidently present. Thus, family physicians show a 
relatively stronger inclination to falsely recognize unstated prototypical items, 
compared to unstated atypical items, than subjects at less advanced levels of 
medical expertise, while for stated items, there are no differences between 
the three expertise levels with regard to their relative recognition scores for 
prototypical versus atypical items. 
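
The pattern behind the three-way interaction can be illustrated with simple arithmetic on the cell means reported in Table 1. The sketch below computes, for each expertise level, the prototypical-minus-atypical difference for stated and for unstated items, and the gap between the two. It is a descriptive contrast on group means only, not a significance test, and the function name is hypothetical.

```python
# Cell means (% "yes") from Table 1, in the order:
# (stated P, stated A, unstated P, unstated A)
TABLE_1 = {
    "4th-year students": (87.12, 81.53, 34.66, 28.69),
    "6th-year students": (81.46, 80.00, 18.33, 9.38),
    "family physicians": (76.99, 77.45, 33.88, 19.29),
}


def typicality_effects(stated_p, stated_a, unstated_p, unstated_a):
    """Return (stated P-A, unstated P-A, unstated minus stated difference)."""
    stated_diff = stated_p - stated_a
    unstated_diff = unstated_p - unstated_a
    return stated_diff, unstated_diff, unstated_diff - stated_diff


for group, means in TABLE_1.items():
    stated, unstated, gap = typicality_effects(*means)
    print(f"{group}: stated P-A = {stated:.2f}, "
          f"unstated P-A = {unstated:.2f}, gap = {gap:.2f}")
```

For the 4th-year students the typicality effect is nearly the same for stated and unstated items (5.59 vs. 5.97 percentage points, a gap of 0.38), whereas for the family physicians it is confined to the unstated items (-0.46 vs. 14.59, a gap of 15.05); the 6th-year students fall in between (1.46 vs. 8.95, a gap of 7.49).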

Table 1. Average percentage of positively recognized statements ("yes" 
answers) for three expertise levels and different types of statements 

                          stated            unstated
Expertise level          P       A         P       A
4th-year students      87.12   81.53     34.66   28.69
6th-year students      81.46   80.00     18.33    9.38
family physicians      76.99   77.45     33.88   19.29

Note: P = prototypical, A = atypical 

These results support to a large extent the hypothesis that illness scripts 
are indeed script-like structures in the sense of Schank and Abelson (1977), 
though not all predictions could be confirmed. Support was found for the 
hypothesis that items in line with a specific illness, but never explicitly 
presented, show a relatively high probability of being falsely recognized, 
compared to statements that are less likely, given the particular script. 
Generally, atypical information did show somewhat better memory 
discrimination than prototypical information if it was unstated, but not if it 
was stated. This might be a consequence of the nature of the type of 
information: the atypical items could be considered neutral or uninformative, 
rather than salient or contradictory. As a consequence, though actually 
presented, they may have received no specific tag, and thus may not have been 
sufficiently activated. 

It could be argued that the results of the three-way interaction do not 
give an accurate impression of the real differences between the expertise 
groups, as in fact the 4th-year students falsely recognized about as high a 
percentage of unstated, prototypical items as the family physicians 
(34.66% vs. 33.88%). However, unlike the family physicians, the 4th-year 
students also showed a high proportion of false recognitions of atypical unstated 
items (28.69% vs. 19.29%), a finding that would not be expected if 
their knowledge were structured in illness script format. The tendency of 
recognition accuracy for prototypical, stated items to decrease with increasing 
expertise level (from approximately 87% for 4th-year students to 77% for 
family physicians) might be explained by assuming that reading a prototypical 
statement increases its activation level in less experienced subjects, while this 
level is already elevated to a maximum in experienced subjects as a result of 
the script activation (cf. Graesser & Nakamura, 1982). Consequently, it would 
be more difficult for experienced subjects to decide whether there had 
indeed been an external activational source (i.e., a text statement presented 
on the screen) for a stated prototypical item, in addition to the "internal" 
script-based activation. 

Figure 2. The effect of typicality and textual presence on positive recognition 
percentages at different levels of expertise. (Vertical axis: % of statements 
positively recognized.) 

Inspection of Figure 2 reveals that, like so much expertise research in the 
medical field, the present study did not escape finding an 
intermediate effect. For the unstated items, the 6th-year students performed 
better than both the less and the more experienced subjects, for atypical as 
well as for prototypical statements. As this effect was found for unstated, but 
not for stated items, it is difficult to explain: it cannot be accounted for by the 
hypothesis that 6th-year students are generally more accurate, or show better 
memory for the actually presented data, than either 4th-year students or 
expert family physicians. 

In conclusion, evidence was found that some of the characteristic 
recognition features of scripted texts apply to a large extent also to illness 
scripts, and that the results showed a developmental tendency from relative 
novices to relative experts, with the data of the latter group being more in line 
with general script research findings than the data of the former group. 

Finally, although it is difficult to derive recommendations for actual 
medical education from this particular study, we want to emphasize that 
support for the notion that medical, especially clinical, expertise can be 
represented as illness scripts also includes support for the educational 
consequences of this view, as outlined in related, recent work (Custers et al., 
1992; Custers et al., in preparation). Perhaps the most important of these 
recommendations is that students should be provided with ample 
opportunities to form illness scripts by seeing as many patients as possible, 
especially with frequently occurring diseases, and with an emphasis on an 
accurate representation of the actual normal variation in Enabling Conditions 
as well as Consequences. 

APPENDIX A 



Example of a case description and the associated test statements 

case: kidney stones colic 

1. Man, aged 47 

2. He is married and has three teenage children 

3. His occupation is store-keeper 

4. At age 30, he was treated for bronchitis 

5. Six years ago, he had his leg broken as a consequence of a car accident 

6. Four years ago, he was treated with medicaments for kidney stones 

7. Some of his relatives are known with coronary diseases and diabetes 
mellitus 

8. His wife rings up, asks the physician for an immediate visit: it's happening 
again 

9. Her man is vomiting almost continuously 

10. He is rolling across the room because of the pain 

11. At the moment the physician arrives, the pain has just subsided 

12. The patient is sitting on the sofa, recovering a bit 

13. He complains about having had a convulsive abdominal pain at the left 
side, abreast of the navel 

14. The pain extends to his groins 

15. The pain emerged all of a sudden, and then gradually subsided 

16. During an attack, he almost can't stand it 

17. Earlier that day, he had already seen some blood in his urine 

18. But he had no pain at that time 

19. His wife says she has measured a temperature of 38.2 degrees Centigrade 

Test items (the actual order of the items in the test was randomly 
determined): 



Typ¹  Pres²  Script³  Item text
P     S      EC       Man, aged 47
P     S      Con      He is rolling across the room because of the pain
P     S      Con      The patient is sitting on the sofa, recovering a bit
P     U      EC       Four years ago, he had a kidney stone colic
P     U      Con      The pain radiated
P     U      Con      In-between the attacks, he doesn't look very ill
A     S      EC       Six years ago, he had his leg broken as a consequence
                      of a car accident
A     S      Con      Earlier that day, he had already seen some blood in his
                      urine
A     U      EC       He is slightly overweight
A     U      Con      He has a mild fever

¹ Typ = item typicality (P = prototypical, A = atypical) 

² Pres = textual presence (S = stated, U = unstated) 

³ Script = script component (EC = Enabling Condition, Con = Consequence) 